Skip to content

AST-148815 : Add CLAUDE.md with architecture and contract notes#20

Closed
cx-anurag-dalke wants to merge 2 commits into
mainfrom
feature/AST-148815
Closed

AST-148815 : Add CLAUDE.md with architecture and contract notes#20
cx-anurag-dalke wants to merge 2 commits into
mainfrom
feature/AST-148815

Conversation

@cx-anurag-dalke

Copy link
Copy Markdown
Contributor

Adds a CLAUDE.md at the repo root documenting the Parser interface / ParsersFactory dispatch model and how to wire a new ecosystem.
Captures per-parser quirks (Maven multi-line Locations, npm lockfile fallback, pypi ==-only scope, dotnet variants).
Documents invariants downstream AST-CLI depends on: 0-based line numbers, literal "latest" for unresolvable versions, stable PackageManager strings (mvn/npm/pypi/go/nuget).
Notes the 60% CI coverage floor.

cx-anurag-dalke and others added 2 commits April 20, 2026 21:27
Captures the Parser/factory dispatch model, per-ecosystem quirks,
and the invariants (0-based line numbers, "latest" sentinel,
PackageManager strings) that downstream AST-CLI relies on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

Logo
Checkmarx One – Scan Summary & Detailsf3b51698-d956-4cc3-99f0-fe01c74be679

Great job! No new security vulnerabilities introduced in this pull request

@cx-atish-jadhav cx-atish-jadhav left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting this started — the content that's here is accurate and well-grounded in the code (verified the Parser interface signature, ParsersFactory, Package/Location structs, the 9e490aa commit reference, the 60% CI floor, and the Go 1.23 / toolchain 1.24.2 pins all match the repo).

Leaving inline comments for consistency with the JIRA epic template (AST-148815 parent). The epic lists these essential sections: Project Overview, Architecture, Repository Structure, Technology Stack, Development Setup, Coding Standards, Project Rules, Testing Strategy, Known Issues, External Integrations, Deployment, Performance, API/Interfaces, Security & Access, Logging, Debugging Steps. This PR covers Project Overview, Architecture, Project Rules (as "Invariants"), and Testing Strategy well — the inline comments point out the missing / thin sections plus a few small accuracy / clarity nits. Some sections (DB schema, deployment) are genuinely N/A for a parser library; recommend adding a one-liner saying so rather than omitting, so every repo's CLAUDE.md has a predictable shape.

Comment thread CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to ## Project Overview to match the JIRA epic template. The epic also asks for "Purpose and status" — please add a one-liner on status (active / maintained / experimental / deprecated).

Comment thread CLAUDE.md

Go module that parses package manifests from multiple ecosystems (Maven, npm, Python, Go, .NET) and returns each declared dependency along with the **exact line/character range** of its declaration. Consumed by [AST-CLI](https://github.com/Checkmarx/ast-cli) to correlate manifest entries with Checkmarx runtime scans — so the `Locations` field is part of the public contract, not a debugging convenience.

## Commands

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The epic asks for a ## Development Setup section. Consider renaming this and adding: (a) prerequisites (Go ≥ 1.23, git), (b) a note that dependencies are vendored so no go mod download is required, (c) an example invocation against a fixture, e.g. go run ./cmd test/resources/pom.xml. Right now a new contributor has to guess what a valid input path looks like.

Comment thread CLAUDE.md
go test -run TestName ./path/... # run a single test by name
go test ./... -coverprofile cover.out # CI gate: total coverage must be >= 60%
go build -o manifest-parser ./cmd # build CLI
go run ./cmd <manifest-file> # run CLI against a manifest

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider showing a truncated sample output (a Package JSON snippet) so readers know what success looks like without opening cmd/main.go.

Comment thread CLAUDE.md
go run ./cmd <manifest-file> # run CLI against a manifest
```

Dependencies are vendored (`vendor/`). Go version is pinned via `go.mod` (1.23 / toolchain 1.24.2).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please promote this into a dedicated ## Technology Stack section (one of the epic's essential sections) listing: Go 1.23 / toolchain 1.24.2, github.com/stretchr/testify v1.8.4, golang.org/x/mod v0.24.0, stdlib encoding/xml + encoding/json. Explicitly state "no database" and "no web framework" so the N/A sections are unambiguous.

Comment thread CLAUDE.md

Dependencies are vendored (`vendor/`). Go version is pinned via `go.mod` (1.23 / toolchain 1.24.2).

## Architecture

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a ## Repository Structure section (or subsection) with the top-level folder tree: cmd/, pkg/parser/, internal/parsers/{maven,npm,pypi,golang,dotnet}/, test/resources/, vendor/. The epic lists "Repository Structure — Folder organization" as its own essential section.

Comment thread CLAUDE.md
Per-ecosystem parsers live under [internal/parsers/](internal/parsers/):
- `maven/` — parses `pom.xml` with `encoding/xml`, then re-scans the raw text to locate each `<dependency>` block line by line. Resolves `${property}` vars from `<properties>` and falls back to `<dependencyManagement>` for empty/ranged versions. Only **direct** `<dependencies>` are emitted (managed-only deps are intentionally skipped to avoid duplicates — see commit `9e490aa`).
- `npm/` — parses `package.json` plus, if present as a sibling file, `package-lock.json` (v1 and v2/v3 formats). Ranged specifiers (`^`, `~`, `*`, `>`, `<`) trigger a lookup in the lockfile; `isLockVersionGreater` compares part-by-part numerically to decide whether the lockfile version satisfies the spec. Without a lock match, ranged versions resolve to `"latest"`.
- `pypi/` — line-oriented scan of `requirements*.txt` / `packages*.txt`. **Only `package==version` is supported** — `pip freeze`, Poetry, and pip-tools output are explicitly out of scope (see README "Known Limitations"). Comments (`#`) and environment markers (`;`) are stripped.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pypi ==-only limitation belongs in a dedicated ## Known Issues / Limitations section per the epic. Please consolidate there: pypi == only (no pip freeze / Poetry / pip-tools), npm ranged versions without a lockfile resolve to "latest", Maven managed-only deps not emitted, etc. Scatter-and-gather hurts discoverability.

Comment thread CLAUDE.md
- `golang/` — uses `golang.org/x/mod/modfile` to parse `go.mod`, then uses the parser's line metadata to compute character offsets.
- `dotnet/` — three separate parsers sharing patterns: `csproj_parser.go` (`.csproj`), `directory_packages_props_parser.go` (central package management), `packages_config_parser.go` (legacy). Versions are read from either a `Version` attribute or a nested `<Version>` element; bracketed ranges become `"latest"`.

### Invariants worth preserving

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The epic names this section "Project Rules — Don'ts and constraints". Consider renaming (or using both: ### Project Rules (Invariants)) for cross-repo consistency — the standardization goal is about predictable headings across repos, not just content.

Comment thread CLAUDE.md

### Invariants worth preserving

- **`Location` uses 0-based line numbers** in most parsers (Maven, Go, npm, pypi use `lineNum - 1` or a 0-based counter). Downstream AST-CLI depends on this; don't "fix" it to 1-based without coordinating.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two ambiguities in Location worth clarifying: are StartIndex / EndIndex 0-based or 1-based, and are they byte offsets or rune/character offsets? AST-CLI callers need to know to render the annotation correctly for non-ASCII manifests.

Comment thread CLAUDE.md

## Tests & fixtures

Each parser has a `*_test.go` next to it using `testify`. Shared fixtures live in [test/resources/](test/resources/) (e.g. `pom.xml`, `package.json`, `requirements.txt`, `test_go.mod`, `Bootstrap.csproj`, `Gateway.csproj`, `packages.config`, `Directory.Packages.props`). When adding behaviors, add a fixture here rather than embedding large manifests in test source.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add: (a) how to view coverage locally (go tool cover -html cover.out), (b) the expected pattern for a new parser (fixture under test/resources/ + *_test.go co-located with parser using testify), (c) any naming convention for fixtures. These answer "how do I add a test?" which is the epic's intent for this section.

Comment thread CLAUDE.md

Each parser has a `*_test.go` next to it using `testify`. Shared fixtures live in [test/resources/](test/resources/) (e.g. `pom.xml`, `package.json`, `requirements.txt`, `test_go.mod`, `Bootstrap.csproj`, `Gateway.csproj`, `packages.config`, `Directory.Packages.props`). When adding behaviors, add a fixture here rather than embedding large manifests in test source.

CI ([.github/workflows/ci.yml](.github/workflows/ci.yml)) enforces a **60% total coverage floor** — adding an untested branch to an already-thin package can push the whole repo below the gate.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing sections from the epic template — please add, even as one-liners for the N/A ones, so every repo's CLAUDE.md has a predictable shape:

  • External Integrations — consumed by AST-CLI; Locations field + PackageManager strings (mvn/npm/pypi/go/nuget) are load-bearing downstream.
  • Deployment — N/A (library; consumed via go get github.com/Checkmarx/manifest-parser).
  • Database Schema — N/A.
  • Performance Considerations — Maven re-scans raw XML after encoding/xml parse (two passes); no streaming; large pom.xml files load fully into memory.
  • Security & Access — parsers consume untrusted manifest files. Note XXE posture (encoding/xml doesn't resolve external entities by default — worth stating explicitly) and whether there's a file-size bound.
  • Logging — CLI uses log.Fatalf on error; library returns error and does not log. Callers should not expect library log output.
  • Debugging Steps — how to run one parser against one fixture, how to set -v for verbose tests, common failure mode (location off-by-one → check 0-based invariant).
  • Coding Standardsgofmt / go vet clean; exported identifiers in pkg/, internal logic in internal/; parser packages follow <ecosystem>/<ecosystem>_parser.go + <ecosystem>_parser_test.go layout.

cx-alon-rosenhek pushed a commit that referenced this pull request Jun 29, 2026
* feat: add Gradle parser feature branch

* feat: update Gradle parser and add implementation plan

* updated gradle-parser

* added support for gradle  libs.versions.toml

* Add CLAUDE.md with architecture and contract notes

Captures the Parser/factory dispatch model, per-ecosystem quirks,
and the invariants (0-based line numbers, "latest" sentinel,
PackageManager strings) that downstream AST-CLI relies on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* updated readme file

* docs: expand CLAUDE.md to meet JIRA epic template requirements

Addresses all inline review comments from PR #20 review:

- Rename Overview → Project Overview, add status line
- Add Technology Stack section (Go 1.23, testify, x/mod, stdlib, no DB)
- Add Repository Structure section with folder tree
- Rename Commands → Development Setup; add prerequisites, clone step,
  coverage HTML command, and sample JSON output
- Add API / Interfaces section with full struct definitions including
  clarification that StartIndex/EndIndex are 0-based byte offsets
- Update Architecture to include Gradle parser (missed in original)
- Replace commit hash reference (9e490aa) with PR #15 link
- Rename Invariants → Project Rules (Invariants); add PackageManager
  string for gradle and the StartIndex/EndIndex byte-offset clarification
- Rename Tests & fixtures → Testing Strategy; add fixture tree, coverage
  HTML command, and expected new-parser pattern
- Add Known Issues / Limitations section (consolidates pypi, npm, maven,
  dotnet, and all-parsers limitations)
- Add External Integrations section (AST-CLI contract fields)
- Add Deployment section (N/A — library, not a service)
- Add Performance Considerations section (Maven two-pass, Gradle catalog,
  no caching)
- Add Security & Access section (XXE posture, no file-size limit, no
  network calls)
- Add Logging section (library vs CLI behaviour)
- Add Coding Standards section (gofmt/vet, pkg vs internal, naming)
- Add Debugging Steps section (5 concrete steps)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* Add SBT (Scala Build Tool) manifest parser support

Implement a production-grade SBT parser that extracts dependencies from all
.sbt files (build.sbt, plugins.sbt, dependencies.sbt, etc.). The parser
supports val/lazy val/def variable declarations, all SBT operators (%, %%,
%%%), Seq blocks, addSbtPlugin syntax, dependency modifiers (exclude,
excludeAll, intransitive, withSources, withJavadoc, cross, classifier),
block and inline comments, scope annotations, dependencyOverrides, and
duplicate package detection.

Includes 29 unit tests at 97.8% coverage with test fixtures containing
known-vulnerable packages (Log4Shell, Jackson, Struts2, commons-collections,
SnakeYAML) for security scanning validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix gradle 0-based line contract and harden parsers for downstream IDE integration

- gradle: emit 0-based line numbers (was off-by-one, broke IDE decorations)
- gradle: default empty catalog versions to "latest" (was causing 400 from realtime-scanner)
- gradle: multi-line dependency locations with rawLines tracking
- maven/dotnet/golang: strip trailing \r so byte offsets are correct on CRLF files
- CLAUDE.md: strengthen 0-based contract, add SBT to parser list
- plugins.sbt: add known-vulnerable packages so IDE decorations can be visually verified

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Merge Python parsers and fix Windows CRLF EndIndex bug

Integrated Poetry, Setuptools (setup.cfg/setup.py), and enhanced PyPI parsers
from Sumit's implementation with existing Gradle/SBT support. All Python
parsers return PackageManager="pypi" per design spec. Added testdata fixtures
and comprehensive test coverage.

Key changes:
- Added internal/parsers/{poetry,setuptools}/ with full test suites
- Enhanced internal/parsers/pypi/ with support for 6 Python formats
- Fixed pre-existing CRLF line ending bug affecting golang, dotnet, maven on Windows
- Updated manifest-file-selector.go with SBT, Poetry, and Setuptools routing
- Updated parser_factory.go with dispatchers for all Python ecosystems
- Updated CLAUDE.md with complete architecture and design pattern documentation

All parsers pass unit tests. Manifest-parser builds and runs successfully.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* gradle and sbt wildcard support

* adding sha for checkout actions

* Using correct sha for cehckout actions in release.yml

* log removal and minor fixes

* Support Android flavor/buildtype dependency configurations in Gradle parser

Fixes AST-160218. The Gradle parser previously matched only a hardcoded
list of configuration keywords, missing all dynamically-generated Android
variants: freeImplementation, paidDebugImplementation, debugApi, kaptTest,
etc. These are auto-generated by the Android Gradle Plugin from product
flavors and build types and contain real dependencies that need scanning.

Changed configKeywords from exact-match to suffix-based regex covering:
- *Implementation, *Api, *CompileOnly, *RuntimeOnly (any flavor/buildtype)
- *Kapt, *Ksp, *AnnotationProcessor (test/androidTest scopes)
- classpath, lintChecks (exact, no variants)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* Changes test data location for pypi-parser tests

* review comments implemented

* replaced action with gh command

---------

Co-authored-by: cx-anurag-dalke <120229307+cx-anurag-dalke@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants